Conversation

@mattf (Collaborator) commented Sep 8, 2025

What does this PR do?

update VertexAI inference provider to use openai-python for openai-compat functions

Test Plan

```
$ VERTEX_AI_PROJECT=... uv run llama stack build --image-type venv --providers inference=remote::vertexai --run
...
$ LLAMA_STACK_CONFIG=http://localhost:8321 uv run --group test pytest -v -ra --text-model vertexai/vertex_ai/gemini-2.5-flash tests/integration/inference/test_openai_completion.py
...
```

i don't have an account to test this. get_api_key may also need to be updated per https://cloud.google.com/vertex-ai/generative-ai/docs/start/openai
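For reference, that Google doc describes an OpenAI-compatible endpoint that openai-python can target directly. A minimal sketch of building its base URL (the `v1beta1` path segment and the example project/location values are assumptions taken from that doc, not from this PR's code):

```python
def vertex_openai_base_url(project: str, location: str) -> str:
    """Build the OpenAI-compatible base URL for a Vertex AI project.

    The path shape follows Google's "Call Vertex AI with the OpenAI
    libraries" doc; double-check the API version segment against it.
    """
    return (
        f"https://{location}-aiplatform.googleapis.com/v1beta1/"
        f"projects/{project}/locations/{location}/endpoints/openapi"
    )


# Hypothetical usage (requires `openai`; api_key must be a short-lived
# OAuth access token, not a static key):
# from openai import AsyncOpenAI
# client = AsyncOpenAI(
#     base_url=vertex_openai_base_url("my-project", "us-central1"),
#     api_key=token,
# )
```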

meta-cla bot added the CLA Signed label (managed by the Meta Open Source bot) Sep 8, 2025
@mattf (Collaborator, Author) commented Sep 8, 2025

@leseb here you go

mattf and others added 2 commits September 10, 2025 15:22
OpenAIMixin expects to use an API key and creates its own AsyncOpenAI
client, so our code now authenticates with the Google service, retrieves
a token, and passes it to the OpenAI client.
It falls back to an empty string if credentials can't be obtained
(letting LiteLLM handle ADC directly).

Signed-off-by: Sébastien Han <[email protected]>
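The flow described in that commit message might look roughly like this (a sketch, not the PR's actual code; it assumes `google-auth` is installed and the function name is hypothetical):

```python
def get_vertex_access_token() -> str:
    """Return a short-lived OAuth access token for Vertex AI, or "" on failure.

    Returning an empty string lets the caller fall back to LiteLLM
    resolving Application Default Credentials (ADC) on its own.
    """
    try:
        import google.auth
        import google.auth.transport.requests

        credentials, _project = google.auth.default(
            scopes=["https://www.googleapis.com/auth/cloud-platform"]
        )
        # Access tokens are short-lived; refresh so the AsyncOpenAI
        # client is handed a currently valid one.
        credentials.refresh(google.auth.transport.requests.Request())
        return credentials.token or ""
    except Exception:
        # No usable credentials (or google-auth missing): fall back.
        return ""
```

A design note: swallowing the exception is deliberate here, since the empty-string fallback is exactly the behavior the commit message describes.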
leseb force-pushed the use-openai-for-vertexai branch from 73e99b6 to b9961c8 on September 10, 2025 13:22
@leseb (Collaborator) commented Sep 10, 2025

Test plan:

```
GOOGLE_APPLICATION_CREDENTIALS=/Users/leseb/Documents/AI/llama-stack/service_account.json  VERTEX_AI_PROJECT=assisted-installer uv run llama stack build --image-type venv --providers inference=remote::vertexai --run
```

```
LLAMA_STACK_CONFIG=http://localhost:8321 uv run --group test pytest -v -ra --text-model vertexai/vertex_ai/gemini-2.5-flash tests/integration/inference/test_openai_completion.py
Uninstalled 1 package in 5ms
Installed 1 package in 2ms
============================================= test session starts ==============================================
platform darwin -- Python 3.12.8, pytest-8.4.1, pluggy-1.6.0 -- /Users/leseb/Documents/AI/llama-stack/.venv/bin/python3
cachedir: .pytest_cache
metadata: {'Python': '3.12.8', 'Platform': 'macOS-15.6.1-arm64-arm-64bit', 'Packages': {'pytest': '8.4.1', 'pluggy': '1.6.0'}, 'Plugins': {'anyio': '4.9.0', 'html': '4.1.1', 'socket': '0.7.0', 'asyncio': '1.1.0', 'json-report': '1.5.0', 'timeout': '2.4.0', 'metadata': '3.1.1', 'cov': '6.2.1', 'nbval': '0.11.0', 'hydra-core': '1.3.2'}}
rootdir: /Users/leseb/Documents/AI/llama-stack
configfile: pyproject.toml
plugins: anyio-4.9.0, html-4.1.1, socket-0.7.0, asyncio-1.1.0, json-report-1.5.0, timeout-2.4.0, metadata-3.1.1, cov-6.2.1, nbval-0.11.0, hydra-core-1.3.2
asyncio: mode=Mode.AUTO, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 27 items

tests/integration/inference/test_openai_completion.py::test_openai_completion_non_streaming[txt=vertexai/vertex_ai/gemini-2.5-flash-inference:completion:sanity] SKIPPED [  3%]
tests/integration/inference/test_openai_completion.py::test_openai_completion_non_streaming_suffix[txt=vertexai/vertex_ai/gemini-2.5-flash-inference:completion:suffix] SKIPPED [  7%]
tests/integration/inference/test_openai_completion.py::test_openai_completion_streaming[txt=vertexai/vertex_ai/gemini-2.5-flash-inference:completion:sanity] SKIPPED [ 11%]
tests/integration/inference/test_openai_completion.py::test_openai_completion_prompt_logprobs[txt=vertexai/vertex_ai/gemini-2.5-flash-1] SKIPPED [ 14%]
tests/integration/inference/test_openai_completion.py::test_openai_completion_guided_choice[txt=vertexai/vertex_ai/gemini-2.5-flash] SKIPPED [ 18%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_non_streaming[openai_client-txt=vertexai/vertex_ai/gemini-2.5-flash-inference:chat_completion:non_streaming_01] PASSED [ 22%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming[openai_client-txt=vertexai/vertex_ai/gemini-2.5-flash-inference:chat_completion:streaming_01] PASSED [ 25%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming_with_n[openai_client-txt=vertexai/vertex_ai/gemini-2.5-flash-inference:chat_completion:streaming_01] SKIPPED [ 29%]
tests/integration/inference/test_openai_completion.py::test_inference_store[openai_client-txt=vertexai/vertex_ai/gemini-2.5-flash-True] PASSED [ 33%]
tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[openai_client-txt=vertexai/vertex_ai/gemini-2.5-flash-True] PASSED [ 37%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_non_streaming_with_file[txt=vertexai/vertex_ai/gemini-2.5-flash] SKIPPED [ 40%]
tests/integration/inference/test_openai_completion.py::test_openai_completion_prompt_logprobs[txt=vertexai/vertex_ai/gemini-2.5-flash-0] SKIPPED [ 44%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_non_streaming[openai_client-txt=vertexai/vertex_ai/gemini-2.5-flash-inference:chat_completion:non_streaming_02] PASSED [ 48%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming[openai_client-txt=vertexai/vertex_ai/gemini-2.5-flash-inference:chat_completion:streaming_02] PASSED [ 51%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming_with_n[openai_client-txt=vertexai/vertex_ai/gemini-2.5-flash-inference:chat_completion:streaming_02] SKIPPED [ 55%]
tests/integration/inference/test_openai_completion.py::test_inference_store[openai_client-txt=vertexai/vertex_ai/gemini-2.5-flash-False] PASSED [ 59%]
tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[openai_client-txt=vertexai/vertex_ai/gemini-2.5-flash-False] PASSED [ 62%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_non_streaming[client_with_models-txt=vertexai/vertex_ai/gemini-2.5-flash-inference:chat_completion:non_streaming_01] PASSED [ 66%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming[client_with_models-txt=vertexai/vertex_ai/gemini-2.5-flash-inference:chat_completion:streaming_01] PASSED [ 70%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming_with_n[client_with_models-txt=vertexai/vertex_ai/gemini-2.5-flash-inference:chat_completion:streaming_01] SKIPPED [ 74%]
tests/integration/inference/test_openai_completion.py::test_inference_store[client_with_models-txt=vertexai/vertex_ai/gemini-2.5-flash-True] PASSED [ 77%]
tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[client_with_models-txt=vertexai/vertex_ai/gemini-2.5-flash-True] PASSED [ 81%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_non_streaming[client_with_models-txt=vertexai/vertex_ai/gemini-2.5-flash-inference:chat_completion:non_streaming_02] PASSED [ 85%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming[client_with_models-txt=vertexai/vertex_ai/gemini-2.5-flash-inference:chat_completion:streaming_02] PASSED [ 88%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming_with_n[client_with_models-txt=vertexai/vertex_ai/gemini-2.5-flash-inference:chat_completion:streaming_02] SKIPPED [ 92%]
tests/integration/inference/test_openai_completion.py::test_inference_store[client_with_models-txt=vertexai/vertex_ai/gemini-2.5-flash-False] PASSED [ 96%]
tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[client_with_models-txt=vertexai/vertex_ai/gemini-2.5-flash-False] PASSED [100%]

=========================================== short test summary info ============================================
SKIPPED [3] tests/integration/inference/test_openai_completion.py:46: Model vertexai/vertex_ai/gemini-2.5-flash hosted by remote::vertexai doesn't support OpenAI completions.
SKIPPED [3] tests/integration/inference/test_openai_completion.py:104: Model vertexai/vertex_ai/gemini-2.5-flash hosted by remote::vertexai doesn't support vllm extra_body parameters.
SKIPPED [4] tests/integration/inference/test_openai_completion.py:83: Model vertexai/vertex_ai/gemini-2.5-flash hosted by remote::vertexai doesn't support n param.
SKIPPED [1] tests/integration/inference/test_openai_completion.py:110: Model vertexai/vertex_ai/gemini-2.5-flash hosted by remote::vertexai doesn't support chat completion calls with base64 encoded files.
================================= 16 passed, 11 skipped, 2 warnings in 31.95s ==================================
```

@mattf (Collaborator, Author) commented Sep 10, 2025

@leseb lgtm. thanks for finishing it.

leseb merged commit 0e27016 into llamastack:main Sep 10, 2025; 22 checks passed.
iamemilio pushed a commit to iamemilio/llama-stack that referenced this pull request Sep 24, 2025 (llamastack#3377).